Document Classification
Abstract
Keywords can be used as attributes for mining rules or as a basis for measuring the similarity of new (unclassified) documents with existing (classified) ones. The focus is on the problem of extracting keywords from a document collection in order to use them for classification. Document classification is a hot topic in machine learning. Typical approaches extract “features,” generally words, from each document, and use the resulting feature vectors as input to a learning scheme that learns how to classify documents. This “bag of keywords” model neglects the contextual effects of keywords.
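The keyword-vector idea described in the abstract can be illustrated with a minimal sketch. This is not the chapter's method: the keyword list, the two labeled documents, and the helper names (`keyword_vector`, `cosine`, `classify`) are hypothetical, and nearest-neighbor matching by cosine similarity stands in for whatever learning scheme the authors actually use.

```python
import math
from collections import Counter

def keyword_vector(text, keywords):
    """Count how often each keyword occurs in the text (word order is ignored)."""
    counts = Counter(text.lower().split())
    return [counts[k] for k in keywords]

def cosine(u, v):
    """Cosine similarity between two count vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify(text, labeled_docs, keywords):
    """Return the label of the classified document most similar to the text."""
    vec = keyword_vector(text, keywords)
    best = max(labeled_docs,
               key=lambda doc: cosine(vec, keyword_vector(doc[0], keywords)))
    return best[1]

# Hypothetical keyword list and labeled collection, for illustration only.
KEYWORDS = ["goal", "match", "team", "stock", "market", "price"]
LABELED = [
    ("the team scored a late goal to win the match", "sports"),
    ("the stock market fell as the oil price rose", "finance"),
]

label = classify("market analysts expect the stock price to rise", LABELED, KEYWORDS)
```

Because each vector only records keyword counts, "team goal" and "goal team" map to the same vector, which is the loss of context the abstract points out.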
Similar resources
A New Document Embedding Method for News Classification
Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A lot of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...
Document Analysis And Classification Based On Passing Window
In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. The system includes preprocessing, document segmentation, feature extraction and document classification. In preprocessing, a document image is enhanced by removing noise, binarizing, and detecting and correcting image skew. In document segmentation, an algorith...
Classification DOCUMENT CONTROL DATA
Field trials of the LTS-3 system at Keesler Air Force Base have been extended, and excellent results have been obtained with high-aptitude students, who had been excluded from earlier trials. A study of the use of the LTS for task simulation has led to the implementation of a new student response interpretation feature for the system. Design of the microfiche selector/reader breadboard for LTS-...
Tion for Document Classification
The bag-of-words (BOW) model is the common approach for classifying documents, where words are used as features for training a classifier. This generally involves a huge number of features. Some techniques, such as Latent Semantic Analysis (LSA) or Latent Dirichlet Allocation (LDA), have been designed to summarize documents in a lower dimension with the least semantic information loss. Some sema...
Intelligent document classification
In this work we investigate some technical questions related to the application of neural networks in document classification. First, we discuss the effects of different averaging protocols for the χ² statistic used to remove non-informative terms. This is an especially relevant issue for the neural network technique, which requires an aggressive dimensionality reduction to be feasible. Second, ...
Journal
Journal title: Advances in data mining and database management book series
Year: 2021
ISSN: 2327-199X, 2327-1981
DOI: https://doi.org/10.4018/978-1-7998-3772-5.ch007